Optimistic signature verification for votes and order votes #14642

Merged
merged 50 commits into from
Oct 7, 2024

vusirikala (Contributor) commented Sep 15, 2024

Description

This PR implements optimistic signature verification to reduce the time required to verify proposal votes and order votes.
When the optimistic signature verification feature flag is enabled, we do not verify these messages on receipt. Instead, we accumulate the unverified messages, and once their accumulated voting power exceeds a threshold, we aggregate all the signatures and verify the single aggregated signature.
If that aggregate verification fails, we fall back to verifying each signature individually. The ValidatorVerifier records the authors that submitted invalid signatures and disables optimistic signature verification for these malicious voters, so their later messages are verified individually.
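The flow described above can be sketched roughly as follows. This is a minimal, hypothetical sketch, not this PR's code: `PendingVotes`, `Verifier`, `Outcome`, and `toy_verify` are illustrative names, and the integer `toy_verify` is an insecure stand-in for aggregatable signatures (Aptos validators aggregate BLS signatures) so the example runs without a crypto crate. Votes from authors in the pessimistic set are verified on receipt; all others are stored unverified until their voting power reaches the threshold, at which point one aggregate check runs, with a per-signature fallback that records bad authors.

```rust
use std::collections::{BTreeMap, HashSet};

// Toy linear "signature" scheme: sig = sk * h mod P and pk = sk. It is NOT secure
// and NOT BLS; like BLS on a single message, signatures and public keys add up.
const P: u64 = 1_000_000_007;

fn toy_verify(pk: u64, msg_hash: u64, sig: u64) -> bool {
    pk % P * (msg_hash % P) % P == sig % P
}

#[derive(Debug, PartialEq)]
enum Outcome {
    VoteAdded(u64),     // accumulated power is still below the threshold
    QuorumReached(u64), // threshold reached and the signatures checked out
    InvalidSignature,   // rejected on receipt (author is in the pessimistic set)
}

// Hypothetical verifier: public keys plus the authors caught sending bad signatures.
struct Verifier {
    public_keys: BTreeMap<u32, u64>,
    pessimistic_verify_set: HashSet<u32>,
}

// Hypothetical pending-vote store for one message.
struct PendingVotes {
    msg_hash: u64,
    threshold: u64,
    verified: BTreeMap<u32, (u64, u64)>, // author -> (voting power, signature)
    unverified: BTreeMap<u32, (u64, u64)>,
}

impl PendingVotes {
    fn power(&self) -> u64 {
        self.verified.values().chain(self.unverified.values()).map(|(p, _)| p).sum()
    }

    // Panics on an unknown author; a real implementation would return an error.
    fn insert(&mut self, author: u32, power: u64, sig: u64, v: &mut Verifier) -> Outcome {
        if v.pessimistic_verify_set.contains(&author) {
            // Known bad actor: verify this signature individually, up front.
            if !toy_verify(v.public_keys[&author], self.msg_hash, sig) {
                return Outcome::InvalidSignature;
            }
            self.verified.insert(author, (power, sig));
        } else {
            // Optimistic path: store the vote without verifying it.
            self.unverified.insert(author, (power, sig));
        }
        if self.power() < self.threshold {
            return Outcome::VoteAdded(self.power());
        }
        // Enough power: one aggregate check instead of one check per signature.
        let agg_sig = self.unverified.values().fold(0u64, |acc, (_, s)| (acc + s % P) % P);
        let agg_pk = self.unverified.keys().fold(0u64, |acc, a| (acc + v.public_keys[a] % P) % P);
        if toy_verify(agg_pk, self.msg_hash, agg_sig) {
            // A passing aggregate does not mark the individual votes as verified.
            return Outcome::QuorumReached(self.power());
        }
        // Aggregate failed: verify individually and remember who sent bad signatures.
        for (a, (p, s)) in std::mem::take(&mut self.unverified) {
            if toy_verify(v.public_keys[&a], self.msg_hash, s) {
                self.verified.insert(a, (p, s));
            } else {
                v.pessimistic_verify_set.insert(a);
            }
        }
        let total = self.power();
        if total >= self.threshold {
            Outcome::QuorumReached(total)
        } else {
            Outcome::VoteAdded(total)
        }
    }
}

fn main() {
    let mut v = Verifier {
        public_keys: (0u32..3).zip([11u64, 22, 33]).collect(),
        pessimistic_verify_set: HashSet::new(),
    };
    let mut votes = PendingVotes {
        msg_hash: 42,
        threshold: 3,
        verified: BTreeMap::new(),
        unverified: BTreeMap::new(),
    };
    // Author 1 first sends a bad signature (999 instead of 22 * 42 = 924).
    for (author, sig) in [(0u32, 462u64), (1, 999), (2, 1386), (1, 924)] {
        println!("{:?}", votes.insert(author, 1, sig, &mut v));
    }
}
```

In the run in `main`, author 1's bad signature makes the first aggregate check fail; the fallback keeps authors 0 and 2 and moves author 1 into the pessimistic set, so author 1's later, valid vote is verified on receipt and completes the quorum.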

Type of Change

  • New feature
  • Bug fix
  • Breaking change
  • Performance improvement
  • Refactoring
  • Dependency update
  • Documentation update
  • Tests

Which Components or Systems Does This Change Impact?

  • Validator Node
  • Full Node (API, Indexer, etc.)
  • Move/Aptos Virtual Machine
  • Aptos Framework
  • Aptos CLI/SDK
  • Developer Infrastructure
  • Other (specify)

How Has This Been Tested?

Key Areas to Review

Checklist

  • I have read and followed the CONTRIBUTING doc
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I identified and added all stakeholders and component owners affected by this change as reviewers
  • I tested both happy and unhappy path of the functionality
  • I have made corresponding changes to the documentation

trunk-io bot commented Sep 15, 2024

⏱️ 10h 19m total CI duration on this PR
Slowest 15 jobs (job: cumulative duration, recent runs):
execution-performance / single-node-performance: 3h 27m, 🟥🟩🟩🟩🟩 (+4 more)
forge-compat-test / forge: 1h 19m, 🟩🟩🟩🟩
forge-e2e-test / forge: 58m, 🟩🟩🟩🟩
test-target-determinator: 45m, 🟩🟩🟩🟩🟩 (+3 more)
execution-performance / test-target-determinator: 39m, 🟩🟩🟩🟩🟩 (+3 more)
check: 29m, 🟩🟩🟩🟩🟩 (+3 more)
general-lints: 15m, 🟩🟩🟩🟩🟩 (+4 more)
rust-cargo-deny: 15m, 🟩🟩🟩🟩🟩 (+4 more)
rust-move-tests: 10m, 🟩
rust-move-tests: 10m, 🟩
rust-move-tests: 10m, 🟩
rust-move-tests: 10m, 🟩
rust-move-tests: 9m, 🟩
rust-move-tests: 9m, 🟩
rust-move-tests: 9m, 🟩

🚨 2 jobs on the last run were significantly faster/slower than expected

Job: duration on last run (7-day average, delta)
execution-performance / test-target-determinator: 8m (5m, +51%)
test-target-determinator: 8m (5m, +50%)


@vusirikala vusirikala changed the base branch from main to satya/ledger_info_with_mixed_signatures September 15, 2024 01:17
@vusirikala vusirikala requested review from sitalkedia, danielxiangzl and igor-aptos and removed request for gregnazario, JoshLind and sasha8 September 15, 2024 01:17
@vusirikala vusirikala added the CICD:run-e2e-tests label Sep 15, 2024 (when this label is present, GitHub Actions runs all land-blocking e2e tests from the PR)

 }

 pub fn verified_voters(&self) -> impl Iterator<Item = &AccountAddress> {
-    self.verified_signatures.signatures().keys()
+    self.signatures.iter().filter_map(|(voter, signature)| {

Contributor:

nit: can shorten as

self.signatures.iter()
    .filter(|(_, sig)| sig.is_verified())
    .map(|(voter, _)| voter)
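A minimal sketch showing that the suggested filter/map chain and the original filter_map form yield the same voters. `SignatureWithStatus` and `PartialSignatures` here are hypothetical stand-ins (a `String` replaces `AccountAddress`), not the types in types/src/ledger_info.rs.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for a signature that carries a "verified" flag.
struct SignatureWithStatus {
    verified: bool,
}

impl SignatureWithStatus {
    fn is_verified(&self) -> bool {
        self.verified
    }
}

struct PartialSignatures {
    // Voter address -> signature; String replaces AccountAddress here.
    signatures: BTreeMap<String, SignatureWithStatus>,
}

impl PartialSignatures {
    // The reviewer's suggested shape: keep verified entries, then project the key.
    fn verified_voters(&self) -> impl Iterator<Item = &String> {
        self.signatures
            .iter()
            .filter(|(_, sig)| sig.is_verified())
            .map(|(voter, _)| voter)
    }

    // Equivalent single-closure filter_map form.
    fn verified_voters_filter_map(&self) -> impl Iterator<Item = &String> {
        self.signatures
            .iter()
            .filter_map(|(voter, sig)| sig.is_verified().then_some(voter))
    }
}

fn main() {
    let signatures = [("alice", true), ("bob", false), ("carol", true)]
        .into_iter()
        .map(|(name, verified)| (name.to_string(), SignatureWithStatus { verified }))
        .collect();
    let partial = PartialSignatures { signatures };
    let voters: Vec<&str> = partial.verified_voters().map(String::as_str).collect();
    let same: Vec<&str> = partial.verified_voters_filter_map().map(String::as_str).collect();
    assert_eq!(voters, same);
    println!("{:?}", voters);
}
```

Both forms are lazy and allocation-free; the filter/map version simply splits the predicate and the projection into separate closures, which some readers find easier to scan.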

types/src/ledger_info.rs (resolved thread)


VoteStatus::NotEnoughVotes(li) => {
    write!(
        f,
        "LI {} has {} verified votes {:?}, {} unverified votes {:?}",

Contributor:

Please don't use the Debug format ({:?}) here; it spams the log.

Contributor (PR author):

Done.
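The reviewer's point is that `{:?}` on the voter collections writes every entry into each log line. A minimal sketch of one alternative, assuming a simplified, hypothetical `VoteStatus` (the real type and the final message wording may differ), is to log only counts through `Display`:

```rust
use std::fmt;

// Hypothetical, simplified stand-in for the VoteStatus in the diff.
enum VoteStatus {
    EnoughVotes,
    NotEnoughVotes {
        ledger_info_id: String,
        verified_voters: Vec<String>,
        unverified_voters: Vec<String>,
    },
}

impl fmt::Display for VoteStatus {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            VoteStatus::EnoughVotes => write!(f, "enough votes"),
            // Log counts only: `{:?}` on the voter lists would dump every address.
            VoteStatus::NotEnoughVotes { ledger_info_id, verified_voters, unverified_voters } => write!(
                f,
                "LI {} has {} verified votes, {} unverified votes",
                ledger_info_id,
                verified_voters.len(),
                unverified_voters.len()
            ),
        }
    }
}

fn main() {
    let status = VoteStatus::NotEnoughVotes {
        ledger_info_id: "abc".to_string(),
        verified_voters: vec!["0x1".to_string(), "0x2".to_string()],
        unverified_voters: vec!["0x3".to_string()],
    };
    println!("{}", status);
    println!("{}", VoteStatus::EnoughVotes);
}
```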

Ok(_) => {
self.merge_signatures(&epoch_state.verifier, false);
self.filter_invalid_signatures(verifier, false);

Contributor:

In some sense we don't really need this, since the aggregation has already succeeded and returned?

Contributor (PR author):

Yeah, we don't. But it keeps the data structure complete, and it makes it easy for test cases to check that, after the aggregate_and_verify function is called, we have x verified signatures.

Contributor:

Alin was telling me that even if the aggregation succeeds, that doesn't necessarily mean all individual signatures are valid: some validators can collude to produce two invalid signatures that aggregate into a valid one. In that sense, for the sake of sanity, I think we probably shouldn't mark them as "valid".

Contributor (PR author):

Done.
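Alin's point can be shown concretely. When a same-message aggregate is just the sum of the individual signatures (as with BLS), two colluding signers can add +Δ to one signature and −Δ to the other: the sum, and therefore the aggregate check, is unchanged, while both individual signatures are invalid. The sketch below assumes a toy linear scheme (not BLS, not secure) purely so it runs without a crypto crate; `sign`, `verify`, and `colluding_pair` are hypothetical helpers.

```rust
// Toy linear "signature" scheme: sig = sk * h mod P, pk = sk. NOT secure and NOT
// BLS; it only shares the property that same-message signatures add up.
const P: u64 = 1_000_000_007;

fn sign(sk: u64, msg_hash: u64) -> u64 {
    sk % P * (msg_hash % P) % P
}

fn verify(pk: u64, msg_hash: u64, sig: u64) -> bool {
    sign(pk, msg_hash) == sig % P
}

/// Returns (aggregate verifies, signer 1 verifies, signer 2 verifies) when two
/// colluding signers shift their signatures by +delta and -delta.
fn colluding_pair(sk1: u64, sk2: u64, msg_hash: u64, delta: u64) -> (bool, bool, bool) {
    let bad1 = (sign(sk1, msg_hash) + delta % P) % P;
    let bad2 = (sign(sk2, msg_hash) + P - delta % P) % P;
    let agg_pk = (sk1 + sk2) % P;
    let agg_sig = (bad1 + bad2) % P;
    (
        verify(agg_pk, msg_hash, agg_sig),
        verify(sk1, msg_hash, bad1),
        verify(sk2, msg_hash, bad2),
    )
}

fn main() {
    let (agg_ok, sig1_ok, sig2_ok) = colluding_pair(11, 22, 42, 5);
    println!("aggregate valid: {agg_ok}, signer 1 valid: {sig1_ok}, signer 2 valid: {sig2_ok}");
}
```

This is why a passing aggregate check is enough to accept the quorum, but not enough to mark each contributing signature as individually verified.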

OrderVoteReceptionResult::VoteAdded(1)
);

vote_0.set_verified();

Contributor:

what are we trying to do by setting it to verified?

Contributor (PR author):

Just wanted to make sure that the data structure works when it receives a mix of verified and unverified signatures.

pending_order_votes.insert_order_vote(&vote_2, &verifier, None),
OrderVoteReceptionResult::VoteAdded(2)
);
assert_eq!(verifier.pessimistic_verify_set().len(), 1);

Contributor:

Instead, here we should check whether the valid signatures are set to verified?

Contributor (PR author):

Added the check.
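The check the reviewer asked for can be sketched as follows. This assumes a hypothetical `filter_invalid_signatures` helper and `Sig` type (the helper's name echoes the diff, but this signature and body are illustrative), with `toy_verify` standing in for real signature verification: after the per-signature fallback, every surviving signature should be marked verified and only the bad author should land in the pessimistic set.

```rust
use std::collections::{BTreeMap, HashSet};

// Toy check standing in for real signature verification (sig == pk * h mod P).
const P: u64 = 1_000_000_007;

fn toy_verify(pk: u64, msg_hash: u64, sig: u64) -> bool {
    pk % P * (msg_hash % P) % P == sig % P
}

// Hypothetical signature entry with a verified flag (cf. `set_verified()` in the test).
struct Sig {
    value: u64,
    verified: bool,
}

// Hypothetical fallback after a failed aggregate check: verify each not-yet-verified
// signature, mark the good ones verified, drop the bad ones, and record their authors.
fn filter_invalid_signatures(
    sigs: &mut BTreeMap<u32, Sig>,
    public_keys: &BTreeMap<u32, u64>,
    msg_hash: u64,
    pessimistic_verify_set: &mut HashSet<u32>,
) {
    sigs.retain(|author, sig| {
        if sig.verified {
            return true; // already checked, e.g. a vote that arrived verified
        }
        if toy_verify(public_keys[author], msg_hash, sig.value) {
            sig.verified = true;
            true
        } else {
            pessimistic_verify_set.insert(*author);
            false
        }
    });
}

fn main() {
    let public_keys: BTreeMap<u32, u64> = (0u32..3).zip([11u64, 22, 33]).collect();
    let mut sigs = BTreeMap::from([
        (0, Sig { value: 462, verified: false }),  // valid, unverified
        (1, Sig { value: 999, verified: false }),  // invalid
        (2, Sig { value: 1386, verified: true }), // arrived already verified
    ]);
    let mut pessimistic = HashSet::new();
    filter_invalid_signatures(&mut sigs, &public_keys, 42, &mut pessimistic);
    // The check the reviewer asked for: every surviving signature is marked verified.
    assert!(sigs.values().all(|s| s.verified));
    println!("kept {:?}, pessimistic {:?}", sigs.keys().collect::<Vec<_>>(), pessimistic);
}
```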

partial_sigs.add_signature(signers[0].author(), vote_0.signature().clone());

// same author voting for the same thing -> DuplicateVote
vote_0.set_verified();

Contributor:

same q here

Contributor (PR author):

Just wanted to make sure that the data structure works when it receives a mix of verified and unverified signatures.

@@ -213,6 +213,54 @@ async fn test_no_failures() {
.unwrap();
}

#[tokio::test]
async fn test_faulty_votes() {

Contributor:

How did you verify this works as expected? We don't have any logs showing pessimistic validators being added.

Contributor (PR author):

I added some println! statements in validator_verifier and checked that authors added to the pessimistic verify set are verified pessimistically from then on.
(Two screenshots, dated 2024-10-04 at 1:37 PM and 1:36 PM, were attached.)
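The reviewer's concern was that nothing in the logs shows validators being moved to the pessimistic set. One way to make that observable without ad-hoc println! statements is to log only on first insertion. This is a hypothetical sketch: `PessimisticVerifySet` and `add` are illustrative names, and `eprintln!` stands in for the node's real logger.

```rust
use std::collections::HashSet;

// Hypothetical wrapper around the pessimistic verify set that logs each author
// exactly once, when it is first added.
#[derive(Default)]
struct PessimisticVerifySet {
    authors: HashSet<String>,
}

impl PessimisticVerifySet {
    /// Returns true if `author` was not already in the set.
    fn add(&mut self, author: &str) -> bool {
        // HashSet::insert returns false when the value was already present,
        // so repeated offences do not produce repeated log lines.
        let newly_added = self.authors.insert(author.to_string());
        if newly_added {
            // A node would use its structured logger here; eprintln! keeps this dependency-free.
            eprintln!("adding {author} to the pessimistic verify set");
        }
        newly_added
    }

    fn contains(&self, author: &str) -> bool {
        self.authors.contains(author)
    }
}

fn main() {
    let mut set = PessimisticVerifySet::default();
    set.add("0xa");
    set.add("0xa"); // no second log line
    println!("0xa pessimistic: {}", set.contains("0xa"));
}
```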


@vusirikala vusirikala enabled auto-merge (squash) October 7, 2024 22:21



github-actions bot commented Oct 7, 2024

✅ Forge suite realistic_env_max_load success on 9cf9e3af36ec803f0798eaff294bef8f0b309794

two traffics test: inner traffic : committed: 12878.40 txn/s, latency: 3093.44 ms, (p50: 2800 ms, p70: 3000, p90: 3600 ms, p99: 6300 ms), latency samples: 4896640
two traffics test : committed: 99.93 txn/s, latency: 3049.06 ms, (p50: 2600 ms, p70: 3000, p90: 4700 ms, p99: 7000 ms), latency samples: 1740
Latency breakdown for phase 0: ["QsBatchToPos: max: 0.265, avg: 0.226", "QsPosToProposal: max: 0.228, avg: 0.181", "ConsensusProposalToOrdered: max: 0.329, avg: 0.299", "ConsensusOrderedToCommit: max: 0.453, avg: 0.427", "ConsensusProposalToCommit: max: 0.753, avg: 0.727"]
Max non-epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 0.91s no progress at version 37499 (avg 0.21s) [limit 15].
Max epoch-change gap was: 0 rounds at version 0 (avg 0.00) [limit 4], 7.60s no progress at version 2653032 (avg 7.60s) [limit 15].
Test Ok


github-actions bot commented Oct 7, 2024

✅ Forge suite compat success on 46bf19eb4f132b9d8fc19eff3f3334cdf9aa1775 ==> 9cf9e3af36ec803f0798eaff294bef8f0b309794

Compatibility test results for 46bf19eb4f132b9d8fc19eff3f3334cdf9aa1775 ==> 9cf9e3af36ec803f0798eaff294bef8f0b309794 (PR)
1. Check liveness of validators at old version: 46bf19eb4f132b9d8fc19eff3f3334cdf9aa1775
compatibility::simple-validator-upgrade::liveness-check : committed: 13100.53 txn/s, latency: 2633.72 ms, (p50: 2100 ms, p70: 2200, p90: 6600 ms, p99: 9400 ms), latency samples: 487540
2. Upgrading first Validator to new version: 9cf9e3af36ec803f0798eaff294bef8f0b309794
compatibility::simple-validator-upgrade::single-validator-upgrading : committed: 7566.80 txn/s, latency: 3617.25 ms, (p50: 3600 ms, p70: 4400, p90: 4900 ms, p99: 5100 ms), latency samples: 139640
compatibility::simple-validator-upgrade::single-validator-upgrade : committed: 7118.85 txn/s, latency: 4477.87 ms, (p50: 4600 ms, p70: 4800, p90: 6500 ms, p99: 6700 ms), latency samples: 242120
3. Upgrading rest of first batch to new version: 9cf9e3af36ec803f0798eaff294bef8f0b309794
compatibility::simple-validator-upgrade::half-validator-upgrading : committed: 6375.48 txn/s, latency: 4467.00 ms, (p50: 4900 ms, p70: 5200, p90: 5900 ms, p99: 6200 ms), latency samples: 129360
compatibility::simple-validator-upgrade::half-validator-upgrade : committed: 7201.80 txn/s, latency: 4483.95 ms, (p50: 4700 ms, p70: 4800, p90: 6400 ms, p99: 6800 ms), latency samples: 245280
4. upgrading second batch to new version: 9cf9e3af36ec803f0798eaff294bef8f0b309794
compatibility::simple-validator-upgrade::rest-validator-upgrading : committed: 11283.26 txn/s, latency: 2399.45 ms, (p50: 2600 ms, p70: 2700, p90: 3000 ms, p99: 3200 ms), latency samples: 200100
compatibility::simple-validator-upgrade::rest-validator-upgrade : committed: 10845.77 txn/s, latency: 2909.91 ms, (p50: 2800 ms, p70: 3000, p90: 3600 ms, p99: 4600 ms), latency samples: 354080
5. check swarm health
Compatibility test for 46bf19eb4f132b9d8fc19eff3f3334cdf9aa1775 ==> 9cf9e3af36ec803f0798eaff294bef8f0b309794 passed
Test Ok


github-actions bot commented Oct 7, 2024

✅ Forge suite framework_upgrade success on 46bf19eb4f132b9d8fc19eff3f3334cdf9aa1775 ==> 9cf9e3af36ec803f0798eaff294bef8f0b309794

Compatibility test results for 46bf19eb4f132b9d8fc19eff3f3334cdf9aa1775 ==> 9cf9e3af36ec803f0798eaff294bef8f0b309794 (PR)
Upgrade the nodes to version: 9cf9e3af36ec803f0798eaff294bef8f0b309794
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1048.78 txn/s, submitted: 1051.51 txn/s, failed submission: 2.73 txn/s, expired: 2.73 txn/s, latency: 2875.44 ms, (p50: 2700 ms, p70: 3000, p90: 4600 ms, p99: 6300 ms), latency samples: 92340
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1056.07 txn/s, submitted: 1057.84 txn/s, failed submission: 1.77 txn/s, expired: 1.77 txn/s, latency: 2866.53 ms, (p50: 2700 ms, p70: 3000, p90: 4500 ms, p99: 6300 ms), latency samples: 95660
5. check swarm health
Compatibility test for 46bf19eb4f132b9d8fc19eff3f3334cdf9aa1775 ==> 9cf9e3af36ec803f0798eaff294bef8f0b309794 passed
Upgrade the remaining nodes to version: 9cf9e3af36ec803f0798eaff294bef8f0b309794
framework_upgrade::framework-upgrade::full-framework-upgrade : committed: 1068.75 txn/s, submitted: 1070.61 txn/s, failed submission: 1.86 txn/s, expired: 1.86 txn/s, latency: 2878.85 ms, (p50: 2700 ms, p70: 3000, p90: 4800 ms, p99: 6900 ms), latency samples: 92120
Test Ok

@vusirikala vusirikala merged commit 2deaf9f into main Oct 7, 2024
48 checks passed
@vusirikala vusirikala deleted the satya/osv_votes_and_order_votes branch October 7, 2024 23:20
@@ -78,6 +78,7 @@ pub struct ConsensusConfig {
// must match one of the CHAIN_HEALTH_WINDOW_SIZES values.
pub window_for_chain_health: usize,
pub chain_health_backoff: Vec<ChainHealthBackoffValues>,
// Deprecated
pub qc_aggregator_type: QcAggregatorType,

Contributor:

Since it's local config, why not remove it?

Labels
CICD:run-e2e-tests (when this label is present, GitHub Actions runs all land-blocking e2e tests from the PR)

4 participants